Variation in chromatin accessibility in human kidney cancer links H3K36 methyltransferase loss with widespread RNA processing defects
Comprehensive sequencing of human cancers has identified recurrent mutations in genes encoding chromatin regulatory proteins. For clear cell renal cell carcinoma (ccRCC), three of the five commonly mutated genes encode the chromatin regulators PBRM1, SETD2, and BAP1. How these mutations alter the chromatin landscape and transcriptional program in ccRCC or other cancers is not understood. Here, we identified alterations in chromatin organization and transcript profiles associated with mutations in chromatin regulators in a large cohort of primary human kidney tumors. By associating variation in chromatin organization with mutations in SETD2, which encodes the enzyme responsible for H3K36 trimethylation, we found that changes in chromatin accessibility occurred primarily within actively transcribed genes. This increase in chromatin accessibility was linked with widespread alterations in RNA processing, including intron retention and aberrant splicing, affecting ~25% of all expressed genes. Furthermore, decreased nucleosome occupancy proximal to misspliced exons was observed in tumors lacking H3K36me3. These results directly link mutations in SETD2 to chromatin accessibility changes and RNA processing defects in cancer. Detecting the functional consequences of specific mutations in chromatin regulatory proteins in primary human samples could ultimately inform the therapeutic application of an emerging class of chromatin-targeted compounds.



[Supplemental material is available for this article.] 

Large-scale cancer sequencing studies continue to identify mutations in genes encoding chromatin regulatory proteins in a wide variety of human cancers. The downstream molecular consequences of these mutations, however, remain unknown. Clear cell renal cell carcinoma (ccRCC) is a particularly relevant model for the study of chromatin regulation in cancer for several reasons. First, relative to mutations in other classes of genes, ccRCCs are marked by frequent mutation of chromatin regulators (Dalgliesh et al. 2010; Varela et al. 2011; Pena-Llopis et al. 2012; Ryan and Bernstein 2012). Three of the more commonly mutated genes in ccRCC are the chromatin modifiers SETD2, PBRM1, and BAP1 (Dalgliesh et al. 2010; Varela et al. 2011; Pena-Llopis et al. 2012; Kapur et al. 2013), suggesting that alterations at the level of chromatin may play a prominent role in the development of ccRCC (Dalgliesh et al. 2010; Varela et al. 2011). Mutation-associated changes in chromatin organization may promote oncogenesis in novel ways, and it has been suggested that specific chromatin regulator mutations may confer differences in patient survival or associate with more advanced disease (Hakimi et al. 2012). However, the downstream effect of these mutations on tumor chromatin biology remains unknown. Second, this cancer is tightly associated with a distinct transcriptional program resulting from the inactivation of the von Hippel-Lindau (VHL) tumor suppressor gene (Kim and Kaelin 2004; Bratslavsky et al. 2007; Nickerson et al. 2008; Jonasch et al. 2012). The loss of VHL results in the stabilization of hypoxia inducible factors (HIFs), transcription factors that activate a complex program of downstream targets, including vascular endothelial growth factor (VEGF) and other genes (Gordan et al. 2008; Gore and Larkin 2011; Jonasch et al. 2012). Third, besides VHL and chromatin regulators, mutations in other cancer-associated pathways are generally absent from ccRCC tumors.

Elucidating the functional consequences of mutations in genes encoding chromatin regulatory proteins on chromatin organization and transcription in human tumor specimens requires the application of techniques developed for cultured cells to primary human tissues. Formaldehyde-assisted isolation of regulatory elements (FAIRE) interrogates chromatin accessibility by isolating nucleosome-depleted regions of DNA (Nagy et al. 2003; Hogan et al. 2006; Giresi et al. 2007; Giresi and Lieb 2009; Simon et al. 2012). These regions harbor regulatory elements such as active transcriptional start sites, transcriptional enhancers, insulators, silencers, and locus control regions (Hogan et al. 2006; Giresi et al. 2007; Giresi and Lieb 2009; Gaulton et al. 2010; Song et al. 2011; Simon et al. 2012). As a component of the ENCODE Project, FAIRE has been used to identify regulatory elements across a wide range of cell lines (Song et al. 2011; Thurman et al. 2012). However, FAIRE has yet to be applied to primary human tissue or used to explore the association between chromatin and genetic alterations in cancer.

We modified FAIRE for use on primary human clinical samples to define the chromatin landscape in a large cohort of ccRCC tumors and matched normal tissues. We identified tumor- and normal-kidney-specific classes of chromatin accessibility changes, as well as those associated with chromatin modifier mutations. We focused our study on SETD2, which trimethylates lysine 36 of histone H3 (H3K36me3) (Rayasam et al. 2003; Sun et al. 2005; Brown et al. 2006; Edmunds et al. 2008; Yoh et al. 2008; Duns et al. 2010). SETD2 associates with the RNA polymerase II complex, and SETD2-dependent methylation tends to occur toward the 3' ends of genes and over nucleosomes located at exons (Edmunds et al. 2008; Kolasinska-Zwierz et al. 2009; Schwartz et al. 2009). SETD2 and H3K36me3 seem to play a role in cotranscriptional RNA processing. In cell-culture-based studies, silencing of SETD2 or of readers of H3K36me3 has been associated with differential exon inclusion for individual genes (Luco et al. 2010; Pradeepa et al. 2012) and with alternative transcription start site utilization (Carvalho et al. 2013). However, the consequences of SETD2 deficiency for chromatin organization and RNA processing remain to be explored on a genome-wide scale and in a disease-relevant model. SETD2 is mutated in ~12% of primary human ccRCC tumors, and its mutation results in H3K36me3 deficiency (Gerlinger et al. 2012). A similar rate of SETD2 mutation has also been observed in high-grade gliomas (Fontebasso et al. 2013). A recent study of intratumor heterogeneity in ccRCC identified distinct SETD2 mutations in all subsections of the same tumor, suggesting that disrupting SETD2 function is important for a subset of tumors (Gerlinger et al. 2012).

We found that SETD2 mutation was associated with chromatin accessibility differences preferentially in gene bodies, and these genes frequently exhibited RNA processing defects. Nearly 25% of all expressed genes demonstrated aberrancies in splicing, including exon skipping, intron retention, and alternative transcription start and termination sites. We observed that misspliced exons were marked by a striking increase in chromatin accessibility immediately upstream of the aberrant splice and a loss of nucleosome occupancy directly over the exon. This study represents the first investigation of chromatin organization in human tumors to identify the impact of chromatin modifier mutations on the genomic landscape. Understanding chromatin dysregulation in cancer may ultimately inform the application of emerging classes of chromatin-targeted small molecules in renal cancer.

Results 

Differences in chromatin accessibility between tumors and normal kidney tissue corroborate the underlying role of HIF in ccRCC

We performed FAIRE-seq on 42 primary ccRCC tumor samples as well as uninvolved matched normal kidney from seven of these patients (Supplemental Fig. S1A,B). We identified about 11,000 500-bp genomic intervals with differences in chromatin accessibility that discriminated tumors from normal kidney (two-sided t-test, P < 0.01) (Fig. 1A,B). For ~70% of these regions, FAIRE signal was increased in the tumor samples, indicative of nucleosome depletion. Using hierarchical clustering, three clusters of genomic loci emerged: Two were marked by tumor-specific nucleosome depletion (Clusters 1 and 2), and another was characterized by nucleosome depletion in normal kidney tissue but not in tumors (Cluster 3). Virtually all tumors exhibited nucleosome depletion at the sites in Cluster 1, whereas ~50% of tumors demonstrated FAIRE enrichment at regions in Cluster 2.
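The tumor-versus-normal comparison above amounts to a two-sided t-test applied window by window at P < 0.01. The following minimal Python sketch illustrates that step under stated assumptions: it uses the pooled-variance (Student) form of the test, because the text does not say whether Welch's correction was applied, and it derives the P-value from the regularized incomplete beta function so that only the standard library is needed. The function names and the dictionary layout of the signal matrix are hypothetical.

```python
import math
from statistics import mean, variance


def _beta_cf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued-fraction evaluation of the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for this m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided P-value of a Student t statistic with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def pooled_t_test(group1, group2):
    """Student's two-sample t-test with pooled variance; returns (t, two-sided P)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    diff = mean(group1) - mean(group2)
    if pooled == 0.0:
        return (0.0, 1.0) if diff == 0 else (math.copysign(math.inf, diff), 0.0)
    t = diff / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return t, t_two_sided_p(t, df)


def differential_windows(signal, is_tumor, alpha=0.01):
    """Return (window, t, P) for windows whose tumor-vs-normal P-value is below alpha.

    signal maps a window ID to per-sample FAIRE values, in the same sample
    order as the boolean list is_tumor.
    """
    hits = []
    for window, values in signal.items():
        tumor = [v for v, flag in zip(values, is_tumor) if flag]
        normal = [v for v, flag in zip(values, is_tumor) if not flag]
        t, p = pooled_t_test(tumor, normal)
        if p < alpha:
            hits.append((window, t, p))
    return hits
```

The sign of t in each hit separates windows that gain accessibility in tumors (positive t) from those that lose it, which is the split behind the ~70% figure above.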

We then examined each cluster for shared biological associations among the loci and adjacent genes. Regions in each cluster were associated with nearby genes using the Genomic Regions Enrichment of Annotations Tool (GREAT) (McLean et al. 2010). For sites in Cluster 1, 2274 genes were identified, many of which were members of several cancer-associated gene sets. Particularly interesting in the setting of ccRCC, where stabilization of the HIF transcription factor family and activation of hypoxia response genes are a central feature of this tumor type, we found that the most significantly associated genes in this cluster were involved in HIF activation and hypoxia regulation (Fig. 1C; full list of associations for each cluster in Supplemental Fig. S2). This association was not observed for regions in Cluster 2 or 3 (Supplemental Fig. S2). Analysis of the sequences in Cluster 1 identified several highly enriched transcription factor (TF) motifs (Heinz et al. 2010), including the hypoxia response element consensus binding sequence (Fig. 1D). We additionally found that previously identified HIF1A and HIF2A (EPAS1) binding sites (Schodel et al. 2011) significantly overlapped only the loci in Cluster 1 (P < 0.001) (Fig. 1B,E). The detection of features associated with the hypoxia response through variation in chromatin accessibility is consistent with the unique link between HIF activity and ccRCC, and these results demonstrate the ability of FAIRE to detect central biological pathways in an unbiased fashion through the identification of variations in chromatin organization.
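GREAT associates genomic regions with genes through regulatory domains built around transcription start sites (TSSs). The sketch below is deliberately simpler and is not GREAT's rule: it assigns each region to the gene whose TSS is nearest to the region midpoint, capped at a maximum distance (the 1-Mb default is an assumption). The input layout and function name are hypothetical; the sketch only illustrates the kind of region-to-gene step that turns a cluster of windows into a gene list.

```python
import bisect


def assign_nearest_gene(regions, tss_by_chrom, max_distance=1_000_000):
    """Map each (chrom, start, end) region to the gene with the nearest TSS.

    tss_by_chrom maps a chromosome to a list of (tss_position, gene) pairs
    sorted by position. Regions with no TSS within max_distance are skipped.
    This is a simplified nearest-TSS rule, not GREAT's regulatory-domain rule.
    """
    positions_by_chrom = {c: [p for p, _ in sites] for c, sites in tss_by_chrom.items()}
    assignments = {}
    for chrom, start, end in regions:
        positions = positions_by_chrom.get(chrom)
        if not positions:
            continue
        mid = (start + end) // 2
        i = bisect.bisect_left(positions, mid)
        # The nearest TSS is either just before or just after the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
        best = min(candidates, key=lambda j: abs(positions[j] - mid))
        if abs(positions[best] - mid) <= max_distance:
            assignments[(chrom, start, end)] = tss_by_chrom[chrom][best][1]
    return assignments
```

Taking `set(assign_nearest_gene(...).values())` gives a cluster's gene list, which would then be tested for ontology enrichment.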

SETD2 mutations link H3K36me3 loss with changes in chromatin accessibility

To identify mutations in chromatin modifiers within tumor samples, we genotyped 33 unique ccRCC tumors (from our cohort of 42 above) and the same seven matched normal kidney specimens (Supplemental Fig. S1A,B). We classified sequence variants based on their predicted ability to confer severe protein structural changes, including frameshift, nonsense, and mutations altering an annotated splice site ("high severity"), as well as missense mutations ("moderate severity") (Fig. 2A). Approximately half of the SETD2 mutations in these classes were predicted to disrupt the catalytic SET domain. High- and moderate-severity mutations were also observed in other domains in SETD2, including the SRI domain, which mediates the interaction with RNA polymerase II. A prediction of copy number using the genotyping data also revealed that, with the exception of one tumor (which displayed one high- and one moderate-severity mutation), loss of heterozygosity coincided with mutations in SETD2 (Supplemental Fig. S4C).

Figure 1. Regions of tumor-specific nucleosome eviction identify the underlying role of HIF in ccRCC. (A) Hierarchical clustering of median-centered FAIRE signal in windows with significant differences between tumors and normal kidney (two-sided t-test, P < 0.01). (B) FAIRE-seq tracks for ccRCC (black) and uninvolved kidney (red) at two loci. (Blue) ChIP-seq signals (Schodel et al. 2011) from HIF1A, HIF2A, and ARNT. (C) The top five Gene Ontology associations (q < 1 × 10^-5) with sites in Cluster 1 are shown. (D) Transcription factor binding motifs enriched in Cluster 1 compared with local background 500-bp flanking windows (>2.5-fold over background and present in at least 10% of the Cluster 1 windows). P-values relative to local background are shown. (E) Fraction of HIF1A and HIF2A binding sites (Schodel et al. 2011) that overlap the loci in Clusters 1, 2, and 3 compared with permuted controls. Error bars represent standard deviation (SD).

We identified about 7000 500-bp windows exhibiting significant variation in FAIRE enrichment between SETD2-mutant and SETD2-normal tumors (two-sided t-test, P < 0.01) (Fig. 2B; Supplemental Fig. S1C). In the SETD2-mutant tumors, FAIRE signal at these regions was most commonly increased (80%), suggesting that SETD2 loss is preferentially associated with greater chromatin accessibility. SETD2 typically trimethylates H3K36 within gene bodies (Barski et al. 2007; Kolasinska-Zwierz et al. 2009). Regions with increased FAIRE signal in SETD2-mutated tumors (one-sided t-test, P < 0.01) also overlapped gene bodies (49% of sites), most of which (91%) were marked by H3K36me3 in normal kidney (P < 0.001 relative to permuted control). More specifically, regions of increased chromatin accessibility associated with SETD2 mutation were enriched directly over the same domains marked by H3K36me3 (24.5%, P < 0.001 relative to permuted control) (Fig. 2C). In contrast, the regions with decreased FAIRE signal showed no association with H3K36me3 and, in fact, were significantly underrepresented relative to permuted control (P < 0.001). As an additional control, we tested for this association at regions with increased FAIRE signal in PBRM1-mutant tumors, which we expected to yield a divergent set of loci. Indeed, areas of increased chromatin accessibility associated with this mutation were significantly underrepresented at H3K36me3-marked regions (P < 0.001 relative to permuted control). Together, these data indicate that regions of nucleosome depletion associated with SETD2 mutation preferentially occur at genic sites normally marked by H3K36me3.

Figure 2. SETD2 mutations link H3K36me3 loss with changes in chromatin accessibility. (A) Schematic representation of SETD2 mutations predicted to have high or moderate severity on protein structure. (B) Hierarchical clustering of median-centered FAIRE signal in windows with significant differences between SETD2-mutant tumors (red) and tumors without SETD2 mutation (gray) (two-sided t-test, P < 0.01). (White) Samples not genotyped. (C) Proportions of nucleosome-depleted loci overlapping H3K36me3-marked regions compared with loci with permuted genomic coordinates. Error bars represent SD. (D) Representative immunostaining of two ccRCC tumor-normal pairs on the tissue microarray. (E) Quantification of H3-normalized H3K36me3 intensity across 11 normal kidneys and 69 renal tumors. Mutation severity (high, red; moderate, green; none, blue) is indicated. (White) Samples with unknown SETD2 mutation status. The threshold for H3K36me3 deficiency was set to the lowest observed intensity in normal tissue (dashed line).
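The permuted-control comparisons above, and the procedure outlined in Methods (the same number of windows drawn at random from the full window list 1000 times), can be sketched as an empirical permutation test. This is a minimal illustration under assumptions: windows and features are plain (chrom, start, end) tuples with half-open coordinates, the overlap check is a linear scan rather than BEDTools, and the +1 correction in the empirical P-value is our choice rather than something the text specifies.

```python
import random


def overlaps_any(window, features):
    """True if a half-open (chrom, start, end) window overlaps any feature."""
    chrom, start, end = window
    return any(c == chrom and s < end and start < e for c, s, e in features)


def overlap_fraction(windows, features):
    """Fraction of windows that overlap at least one feature."""
    return sum(overlaps_any(w, features) for w in windows) / len(windows)


def permutation_test(significant, all_windows, features, n_perm=1000, seed=0):
    """Compare observed overlap with overlaps of random same-size window sets.

    Returns (observed fraction, P for enrichment, P for depletion).
    """
    rng = random.Random(seed)
    observed = overlap_fraction(significant, features)
    k = len(significant)
    null = [overlap_fraction(rng.sample(all_windows, k), features) for _ in range(n_perm)]
    p_enriched = (1 + sum(f >= observed for f in null)) / (n_perm + 1)
    p_depleted = (1 + sum(f <= observed for f in null)) / (n_perm + 1)
    return observed, p_enriched, p_depleted
```

Both tails are returned because the text reports enrichment (SETD2-associated gains over H3K36me3 domains) as well as underrepresentation (decreased-FAIRE and PBRM1-associated regions). With 1000 permutations and the +1 correction, the smallest attainable P-value is 1/1001.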

Although SETD2 is responsible for trimethylation of H3K36, other mechanisms may influence H3K36 methylation status. Moreover, the effects of specific classes of SETD2 mutations in human tumors on H3K36 methylation in RCC are not known. We quantified H3K36me3 on a tissue microarray (69 tumors, 11 matched normal kidneys) (Fig. 2D; Supplemental Fig. S1B). Whereas normal kidney samples demonstrated consistent nuclear H3K36me3 signal (Fig. 2E), tumors displayed a range of staining intensity, with 53% of tumors exhibiting reduced H3K36me3 intensity. Hereafter, this group of tumors is referred to as "H3K36me3 deficient." Each of the eight tumors that contained mutations predicted to affect SETD2 activity and that were screened by IHC demonstrated H3K36me3 deficiency (Fig. 2E). Tumors containing mutations upstream of the SET domain (Q320fs, E978*, and Q1409*) displayed a complete loss of H3K36me3 signal. However, tumors with SETD2 mutations located within the SET domain (G1681fs) or in the SRI domain (R2510L) displayed reduced H3K36me3 signal, suggesting that some mutations may cause a partial loss of function. Several tumors (eight of 13) without identified SETD2 mutations also exhibited reduced H3K36me3 signal. SETD2 was undetectable by immunohistochemistry in two of these tumors, whereas others exhibited decreased SETD2 mRNA, suggesting alternate mechanisms for H3K36me3 loss (Supplemental Fig. S4). We also observed evidence for SETD2 gene hemizygosity in other H3K36me3-deficient SETD2-normal tumors, suggesting that loss of heterozygosity may contribute to deficiency in H3K36 methylating activity (Supplemental Fig. S4C). Interestingly, one tumor (Tumor 25 in Supplemental Fig. S4C) did not exhibit a copy number loss, carried two SETD2 mutations (E1846*, high severity; I2499S, moderate severity), and showed a moderate H3K36me3 deficiency (an intensity value of 0.36 in Fig. 2E). We would thus predict that at least one of these mutations is hypomorphic, which would explain the intermediate magnitude of the H3K36me3 deficiency. Similarly, we detected two mutations in SETD2 in another tumor (Tumor 3 in Supplemental Fig. S4C), which exhibited a global loss in H3K36me3 staining along with copy number loss. These data suggest either that the tumor cell population was heterogeneous and the remaining allele was differentially mutated in each population (as was observed in Gerlinger et al. 2012) or that the one remaining allele was mutated at two locations. Together, these data illustrate that defective H3K36me3 is a common feature of ccRCC and that SETD2 genotype alone underestimates H3K36me3 deficiency.
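The deficiency call described above (and in the Figure 2E legend) follows a simple rule: the H3-normalized H3K36me3 intensity of each tumor is compared with the lowest value observed among normal kidneys. A minimal sketch follows; the sample names and intensity values are hypothetical, and treating only values strictly below the threshold as deficient is an assumption.

```python
def h3_normalized(h3k36me3, h3):
    """H3K36me3 staining intensity normalized to total histone H3."""
    return h3k36me3 / h3


def call_h3k36me3_deficiency(normals, tumors):
    """normals/tumors map sample IDs to (H3K36me3 intensity, H3 intensity).

    The threshold is the lowest H3-normalized intensity among normal samples;
    tumors strictly below it are called deficient (strictness is an assumption).
    """
    threshold = min(h3_normalized(*v) for v in normals.values())
    calls = {
        sample: "deficient" if h3_normalized(*v) < threshold else "normal"
        for sample, v in tumors.items()
    }
    return threshold, calls
```

Because the threshold is anchored to the least intense normal sample, any tumor falling below the entire normal range is flagged, independent of its SETD2 genotype.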



SETD2 mutation is associated with DNA hypomethylation proximal to sites of nucleosome depletion

In many higher eukaryotes, the H3K36me3 mark is recognized by several chromatin readers, one of which is the PWWP domain of the DNA methyltransferase DNMT3A, resulting in DNA methylation proximal to the marked histone (Dhayalan et al. 2010). Using ccRCC DNA methylation data from The Cancer Genome Atlas (TCGA), we observed localized changes (P < 0.05) in DNA methylation, primarily (>70% of probes) DNA hypomethylation, in SETD2-mutant tumors of the TCGA data set at nucleosome-depleted regions identified in our SETD2-mutant tumors (Supplemental Fig. S5). These data link changes in DNA methylation to sites of nucleosome eviction and/or loss of H3K36me3 through SETD2 mutation. This result underscores the importance of H3K36me3 and suggests that its loss may confer a multifaceted alteration in the epigenome.



Intron retention and splicing defects affect a large fraction of genes with altered chromatin accessibility in SETD2 mutant tumors

H3K36me3 has been previously implicated in RNA processing (Luco et al. 2010; de Almeida et al. 2011; Pradeepa et al. 2012), an association not previously examined in primary tissues or in a disease-relevant model. We thus hypothesized that H3K36me3 deficiency would alter RNA processing and splicing in tumors specifically at genes with altered chromatin accessibility. To assess total RNA, including pre-mRNA and nonpolyadenylated transcripts, we performed RNA-seq on ribosome-depleted RNA from 33 tumors, all but one of which was annotated with mutational status (Fig. 2; Supplemental Fig. S1); six tumors without H3K36me3 status assessed by immunohistochemistry were omitted. We observed that H3K36me3-deficient tumors displayed a striking enrichment of intronic pre-mRNA signal compared with tumors with normal H3K36me3 levels. To quantify this effect, we calculated intron retention scores (IRS), which reflect the ratio of intronic to exonic RNA-seq reads on a gene-by-gene basis for each tumor. IRS values range from 0 to 1, where a score of 0 represents a completely spliced message and a score of 1 represents uniform genic coverage. Intron retention was dramatically increased in the H3K36me3-deficient tumors at 95% of the transcripts (6551 in total) marked by H3K36me3 in normal kidney and by nucleosome depletion (one-sided t-test, P < 0.01) in H3K36me3-deficient tumors (Fig. 3A,C; Supplemental Figs. S1B, S6A, S7). To confirm this result, we performed ChIP-seq on an independent normal kidney sample (Supplemental Fig. S6B). Of the 6551 transcripts initially identified (Fig. 3A; Supplemental Fig. S1), 6101 were identified using the second normal kidney sample, representing a 93% overlap. When the 6551 transcripts were instead stratified by PBRM1 mutation status, widespread intron retention was not observed (Fig. 3B), suggesting that this effect is specific to tumors with H3K36me3 deficiency. Many of the affected genes are part of recognized cancer-associated pathways, including known tumor suppressors (e.g., MET, PTEN, and TP53), genes in the DNA repair pathway (e.g., ATR, RAD50, POLN, and XRCC1), and cell cycle regulators (e.g., CCNB1 and CCND3), as well as numerous receptors and protein kinases (e.g., BRAF, EGFR, PIK3CA, and TGFBR3) (Supplemental Fig. S8).

Figure 3. H3K36me3 deficiency is associated with intron retention. Intron retention scores for selected genes (Supplemental Fig. S1C) were compared between (A) H3K36me3-deficient tumors and H3K36me3-normal tumors, and (B) PBRM1-mutant and PBRM1-normal tumors. (C) Example genes exhibiting increased intron retention in H3K36me3-deficient tumors (top, PPP2CB; bottom, COX6C). Intron retention scores, genic coverage (calculated with both intron and exon reads), and exonic coverage (calculated only with exonic reads) are provided for two H3K36me3-deficient tumors (red) and two H3K36me3-normal tumors (black).
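The intron retention score is described only as the ratio of intronic to exonic reads, scaled so that 0 means a fully spliced message and 1 means uniform coverage across the gene. One way to obtain that scale, assumed here rather than taken from the text, is to compare read densities (reads per base) in introns and exons and cap the ratio at 1. The sketch below assumes non-overlapping (collapsed) exon intervals, counts each read at a single position, and uses hypothetical names throughout.

```python
def intron_retention_score(gene_start, gene_end, exons, read_positions):
    """Length-normalized intronic/exonic read ratio for one gene, capped at 1.

    exons: non-overlapping half-open (start, end) intervals inside the gene.
    read_positions: one genomic coordinate per read (e.g., its start).
    Returns None when the score is undefined (no exonic reads or no intron).
    """
    exonic_len = sum(end - start for start, end in exons)
    intronic_len = (gene_end - gene_start) - exonic_len
    exonic = intronic = 0
    for pos in read_positions:
        if not gene_start <= pos < gene_end:
            continue  # read falls outside this gene
        if any(start <= pos < end for start, end in exons):
            exonic += 1
        else:
            intronic += 1
    if exonic == 0 or intronic_len <= 0:
        return None
    ratio = (intronic / intronic_len) / (exonic / exonic_len)
    return min(ratio, 1.0)
```

Computing this score per gene and per tumor, then comparing H3K36me3-deficient with H3K36me3-normal tumors (for example, with a one-sided t-test), corresponds to the kind of comparison summarized in Figure 3A.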

Widespread RNA processing defects linked with SETD2 mutations persist in the mature RNA pool and are marked by altered chromatin accessibility

To test whether the observed changes in pre-mRNA persist into mature polyadenylated RNA, we analyzed TCGA RNA-seq data derived from poly(A)+ mRNA isolated from a large cohort (n = 416) of ccRCC tumors. Applying a gene-model-independent algorithm for read mapping and transcript prediction (Singh et al. 2011), we observed that SETD2-mutant tumors exhibited significant alterations in transcript processing (3929 transcripts) (Fig. 4A,B). Alterations included intron retention (12% of altered transcripts) (Supplemental Figs. S1B, S9), variation in exon utilization (66% of altered transcripts) (Fig. 4B,C; Supplemental Fig. S10), and differences in transcriptional start and termination site usage (22% of altered transcripts). We also observed the generation of previously unannotated splice isoforms, which we validated by quantitative PCR in independent tumor samples (Fig. 4D). Aberrancies in RNA processing were detected more frequently in highly expressed genes (Supplemental Fig. S11A); the low abundance of some messages may preclude the detection of differences in their processing. However, overall expression of genes exhibiting defects in RNA processing was comparable between SETD2-mutant and SETD2-normal tumors (Supplemental Fig. S11B).

Since H3K36me3 preferentially marks well-positioned exonic nucleosomes (Edmunds et al. 2008; Kolasinska-Zwierz et al. 2009; Schwartz et al. 2009), we analyzed chromatin accessibility around the intron/exon boundary of misspliced exons. H3K36me3-normal tumors demonstrated the expected reduction of FAIRE signal immediately downstream from intron/exon junctions as well as a concomitant enrichment in H3K36me3 (from ChIP-seq in normal kidney), indicative of a well-positioned exonic nucleosome (Fig. 4E, left), corroborating previous reports (Kolasinska-Zwierz et al. 2009). Strikingly, in H3K36me3-deficient tumors, evidence of the exonic nucleosome was lost, and a dramatic increase in chromatin accessibility was observed immediately upstream (50 bp) of the intron/exon junction (Fig. 4E, left, red line). This pattern was also evident, although less pronounced, at internal exon start sites of random genes (Fig. 4E, middle) but was completely absent at random genic positions (Fig. 4E, right). That chromatin accessibility changed even at internal exons chosen regardless of whether they exhibited a splicing defect may indicate a more widespread defect that does not always result in detectable variation in splicing. These data demonstrate the ability to detect subtle variations in chromatin organization in primary human tumors and link H3K36me3 loss with alterations in chromatin accessibility at exons.
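Figure 4E aggregates FAIRE signal in transcript orientation around intron/exon junctions. A minimal sketch of such a meta-profile, assuming per-base signal arrays held in memory, is shown below. The convention for `pos` (the intron/exon boundary, with the exon starting at `pos` on the plus strand and occupying coordinates below `pos` on the minus strand), the flank and bin sizes, and all names are assumptions for illustration.

```python
def junction_profile(signal, junctions, flank=200, bin_size=10):
    """Average binned signal around intron/exon junctions in transcript orientation.

    signal: dict chrom -> list of per-base values.
    junctions: (chrom, pos, strand) with pos the intron/exon boundary. On "+"
    the exon starts at pos; on "-" the exon occupies coordinates below pos.
    Bins before the midpoint of the output are intronic (upstream); bins
    after it are exonic.
    """
    if (2 * flank) % bin_size:
        raise ValueError("2 * flank must be a multiple of bin_size")
    n_bins = 2 * flank // bin_size
    totals = [0.0] * n_bins
    used = 0
    for chrom, pos, strand in junctions:
        track = signal.get(chrom)
        if track is None or pos - flank < 0 or pos + flank > len(track):
            continue  # skip junctions too close to a chromosome end
        window = track[pos - flank:pos + flank]
        if strand == "-":
            window = window[::-1]  # put minus-strand junctions in transcript orientation
        for b in range(n_bins):
            totals[b] += sum(window[b * bin_size:(b + 1) * bin_size]) / bin_size
        used += 1
    return [t / used for t in totals] if used else []
```

Running the function separately for H3K36me3-normal and H3K36me3-deficient tumors, and overlaying the two profiles, yields a plot of the Figure 4E type; a rise in the bins just before the midpoint would correspond to the upstream accessibility gain described above.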

Discussion 

To identify the genomic consequences of mutated chromatin regulators, we modified and applied FAIRE-seq to a large cohort of primary kidney tumors. Using an unbiased approach, we identified variation in chromatin accessibility distinguishing tumors from normal kidney. Tumor-specific open chromatin corresponded to HIF-targeted sites and was linked to genes involved in the hypoxia response. This result reflects the well-studied association of ccRCC development with VHL inactivation and HIF stabilization. These data also serve to validate the use of FAIRE in primary tumors to detect biologically meaningful pathways.

We then associated variation in chromatin accessibility with mutations in chromatin regulators. Focusing on SETD2, we observed widespread increases in chromatin accessibility, especially in gene bodies typically harboring H3K36me3 in normal kidney tissue. A recent report suggested that SETD2 silencing in cultured cells results in alternative internal transcriptional start sites (Carvalho et al. 2013), akin to cryptic initiation observed in yeast (Carrozza et al. 2005; Lickwar et al. 2009). Our data using human tumor specimens support a more diverse model for transcriptional defects, including retention of introns, missplicing of exons, and usage of alternative transcriptional start or end sites. These defects were widespread, affecting nearly 25% of all expressed genes, and defects were more common in highly transcribed genes.

Moreover, we found a surprising increase in chromatin accessibility immediately upstream (50 bp) of misspliced exons in SETD2-mutated tumors. This result suggests a mechanism by which the altered inclusion of the downstream exon is related to nucleosome positioning over the exon itself as well as to the adjacent upstream nucleosome. Nucleosome positioning and histone modifications (including H3K36me3) are known to regulate multiple processes involved in splicing, including the speed or pausing of RNA polymerase (Kadener et al. 2001; Howe et al. 2003; Batsche et al. 2006; Kornblihtt 2007; Munoz et al. 2009) and the ability of the splicing machinery to appropriately recognize the splice donor and acceptor. Our finding also suggests that the positioning of this upstream nucleosome may be related to trimethylation of H3K36 on the exonic nucleosome. In Saccharomyces cerevisiae, loss of Set2 leads to destabilization of nucleosomes through hyperacetylation of gene bodies and cryptic transcriptional initiation (Carrozza et al. 2005; Lickwar et al. 2009). Because hyperacetylation was not observed following SETD2 silencing (Edmunds et al. 2008), the increased chromatin accessibility we observed over gene bodies may represent nucleosome destabilization in a hyperacetylation-independent manner. Although our results directly link SETD2 mutation and H3K36 trimethylation to chromatin accessibility, studies that specifically examine nucleosome positioning and histone modification will be necessary to fully investigate this potential mechanism.

Although our data associate SETD2 mutations/H3K36me3 deficiency with aberrant RNA processing, exactly how this dysregulation contributes to tumorigenesis remains unknown. A significant fraction of the deregulated transcripts include known tumor suppressors, DNA damage response proteins, and kinases. Strikingly, 58% of genes with altered splicing patterns (Fig. 4A,B) encode annotated phosphoproteins (P = 7.3 × 10^-109), representing an enrichment exceeding that of genes annotated as having alternate splice isoforms (P = 2 × 10^-60), a finding also observed in genes exhibiting retained introns (Supplemental Figs. S8, S12). Alterations in the abundance, stability, or splicing of RNA could induce changes in the phosphoproteome and disrupt normal cellular signaling and growth checkpoints, leading to tumorigenesis. Deregulated signaling as well as transcriptional defects provide numerous putative targets for therapeutic exploitation. Additionally, the application of FAIRE, or of IHC for H3K36 trimethylation, could enable the classification of clinical specimens into functional tumor subtypes.
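The text does not state which test produced the enrichment P-values for phosphoprotein annotation; functional-annotation tools commonly use a one-sided hypergeometric (Fisher-type) test. The sketch below computes that upper-tail probability with exact integer arithmetic from Python's standard library. It illustrates the kind of test involved and is not a reconstruction of the reported values; the function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N total genes, K annotated, n drawn).

    Here n would be the genes with altered splicing and k how many of them
    carry the annotation (e.g., phosphoprotein).
    """
    upper = min(n, K)
    if k > upper:
        return 0.0
    # math.comb returns 0 when the lower argument exceeds the upper one.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)


def fold_enrichment(k, n, K, N):
    """Observed annotated fraction divided by the background annotated fraction."""
    return (k / n) / (K / N)
```

Exact integer binomials avoid the underflow that a naive floating-point product would hit for P-values as small as those quoted above; the final division still returns an ordinary float.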

This study advances our understanding of the relationship between genetic alterations affecting chromatin organization and alterations in transcription. RNA processing defects in a large fraction of expressed genes, many of which are tumor suppressors critical for cellular function, may be a common phenotype of many cancers. Comprehensive mapping of the chromatin landscape in primary tumors offers a new tool for understanding the functional consequences of chromatin modifier mutations in human disease.



Methods 

Formaldehyde-assisted isolation of regulatory elements (FAIRE) and hierarchical clustering of differentially open chromatin

FAIRE was performed as previously described (Simon et al. 2012). Sequencing was performed using 36- or 50-bp single-end reads (Illumina GA IIx or HiSeq 2000). Reads were filtered using TagDust (Lassmann et al. 2009) and aligned to the reference human genome (hg19) with Bowtie (Li and Durbin 2009) using default parameters. Reads were counted in 500-bp sliding windows across the genome, normalized for sequencing depth, and adjusted for batch effects using principal components analysis (PCA). One outlier normal kidney sample was removed at this step, and all normal kidney samples were removed for subsequent tumor-only analyses. For clustering analyses, only windows with sufficient sequencing depth (row average > 0.25) were retained; groups were compared using one- or two-sided t-tests (P < 0.01), and the significant windows were clustered and plotted (Saldanha 2004). Feature intersections were computed using BEDTools (Quinlan and Hall 2010).
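The windowing, normalization, filtering, and median-centering steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not give the sliding-window step (250 bp is assumed), each read is represented by a single aligned position, depth normalization is assumed to be reads per million mapped reads (the scale on which the 0.25 row-average cutoff is applied here is therefore also an assumption), and the PCA-based batch correction is omitted. Function names are hypothetical.

```python
from statistics import mean, median


def count_windows(read_positions, chrom_length, window=500, step=250):
    """Count reads in sliding windows [k*step, k*step + window) along one chromosome."""
    n_windows = max(0, (chrom_length - window) // step + 1)
    counts = [0] * n_windows
    for pos in read_positions:
        first = max(0, -(-(pos - window + 1) // step))  # ceil((pos - window + 1) / step)
        last = min(n_windows - 1, pos // step)
        for k in range(first, last + 1):
            counts[k] += 1
    return counts


def per_million(counts, total_mapped_reads):
    """Normalize window counts for sequencing depth (reads per million mapped reads)."""
    return [c * 1e6 / total_mapped_reads for c in counts]


def filter_and_median_center(rows, min_row_mean=0.25):
    """Keep windows whose mean signal exceeds the cutoff; median-center each row across samples."""
    centered = {}
    for window_id, values in rows.items():
        if mean(values) > min_row_mean:
            m = median(values)
            centered[window_id] = [v - m for v in values]
    return centered
```

Median-centering each window across samples puts every row on a common scale, so that hierarchical clustering (as in Figs. 1A and 2B) groups windows by their pattern across tumors rather than by their absolute read depth.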

Reprocessing of HIF1A, HIF2A, and ARNT ChIP-seq data

ChIP-seq reads for HIF1A, HIF2A, and ARNT (Schodel et al. 2011) were filtered using TagDust (Lassmann et al. 2009) and aligned to the reference human genome (hg19) using Bowtie (Li and Durbin 2009), requiring unique read placement. Binding sites for HIF1A and HIF2A were identified using MACS (Feng et al. 2012) with a shift size of 250 bp and a significance threshold of q < 0.05.

Ontologies associated with differentially open chromatin 

Regions from Clusters 1-3 were associated with Gene Ontology terms using GREAT (McLean et al. 2010), with all possible 500-bp windows as background. The top five ontologies with q < 1 × 10^-5 are presented; full Gene Ontology associations are supplied in Supplemental Figure S3.

Motif analysis 

Significantly enriched known transcription factor (TF) motifs were identified using HOMER (Heinz et al. 2010). The 500-bp flanking region was used as local background. Only TF motifs that were enriched more than 2.5-fold over background, present in at least 10% of the target sequences, and significant at q < 0.0001 are presented in Figure 1D. Highly similar entries were merged.
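The three motif-selection criteria are easy to misread in prose, so here is a minimal Python sketch of the filter. The `MotifHit` record and its fields are hypothetical and are not HOMER's output format. Fold enrichment is taken here to be the ratio of target to background motif frequency, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class MotifHit:
    """Hypothetical summary of one known-motif enrichment result."""
    name: str
    pct_target: float      # % of target sequences containing the motif
    pct_background: float  # % of background sequences containing the motif
    q_value: float


def passes_filters(m, min_fold=2.5, min_pct_target=10.0, max_q=1e-4):
    """Apply the fold-enrichment, prevalence, and q-value criteria together."""
    # Assumed definition: fold enrichment = target frequency / background frequency
    fold = float("inf") if m.pct_background == 0 else m.pct_target / m.pct_background
    return fold > min_fold and m.pct_target >= min_pct_target and m.q_value < max_q
```

A motif must satisfy all three conditions at once; failing any one of them excludes it.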

SureSelect custom capture and mutation calling 

Genotyping was performed using SureSelect XT Custom Capture (Agilent). Multiplexing was achieved using TruSeq adapters (Illumina); samples were pooled prior to capture and amplified post-capture using TruSeq PCR primers (Illumina). Blocking reagents were replaced with water to avoid cross-reactivity. Sequencing was performed using 50-bp paired-end reads (Illumina HiSeq 2000). Reads were aligned to the reference human genome (hg19) using BWA (Li and Durbin 2009). Genes were sequenced to an average coverage of 200×, with 85% of the target sequenced to at least 50×. Genotypes were determined using the Genome Analysis Toolkit (GATK) (McKenna et al. 2010) "Better" protocol. Only high-confidence variants (quality score > 100) that were predicted to have high or moderate severity and were not reported in dbSNP (v129) were considered.
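The variant-selection rule can be written as a single predicate, sketched below in Python. The `Variant` record is hypothetical. The severity labels ("HIGH", "MODERATE", and so on) are assumed SnpEff-style impact categories, because the text does not name the annotation tool. The dbSNP check is reduced to whether an rsID was assigned.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed impact labels; the annotation tool is not named in the text.
KEEP_SEVERITY = {"HIGH", "MODERATE"}


@dataclass
class Variant:
    """Hypothetical minimal variant record."""
    chrom: str
    pos: int
    qual: float
    severity: str             # e.g. "HIGH", "MODERATE", "LOW", "MODIFIER"
    dbsnp_id: Optional[str]   # rsID if reported in dbSNP v129, else None


def is_candidate(v, min_qual=100.0):
    """High-confidence, high/moderate-severity variants absent from dbSNP."""
    return v.qual > min_qual and v.severity in KEEP_SEVERITY and v.dbsnp_id is None
```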

Histone methylation ChIP-seq and data processing

ChIP DNA for H3K36me3 and input DNA from normal kidney were sequenced on the Illumina Genome Analyzer II. Reads were aligned to the reference genome (hg19) using Bowtie, requiring unique alignment. H3K36me3 sites were first called using ZINBA (Rashid et al. 2011). Nearby sites were then merged into broader domains using Galaxy (Goecks et al. 2010), with two or more sites lying within 5 kb of each other merged into a single domain. The average H3K36me3 signal across gene bodies was plotted using CEAS (Shin et al. 2009).
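The domain-building step, which merges sites separated by at most 5 kb, can be sketched as a standard interval merge. This Python function is a hypothetical stand-in for the Galaxy workflow. The text does not say what happens to isolated sites, so this sketch keeps them and records the number of contributing sites. Domains built from two or more sites can then be selected with `n_sites >= 2`.

```python
def merge_sites(sites, max_gap=5000):
    """Merge (chrom, start, end) intervals whose gap is <= max_gap.

    Returns a list of (chrom, start, end, n_sites) tuples, sorted by
    chromosome and start. Overlapping intervals (negative gap) are
    always merged.
    """
    merged = []
    for chrom, start, end in sorted(sites):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            _, m_start, m_end, n = merged[-1]
            merged[-1] = (chrom, m_start, max(m_end, end), n + 1)
        else:
            merged.append((chrom, start, end, 1))
    return merged
```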

Feature overlap permutations 

The significance of the overlap between sites of differentially open chromatin associated with SETD2 or PBRM1 mutations and H3K36me3 sites was determined by permutation. First, the overlap between the actual set of significant windows and histone methylation sites was computed. Then, 1000 times, the same number of windows was drawn at random from the full list of windows (regardless of significance), and an empirical P-value was determined by counting the number of permutations in which the overlap of the permuted set exceeded that of the actual set.
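A minimal Python sketch of this permutation test follows. It assumes that windows are represented by IDs and that overlap with H3K36me3 domains has already been computed, for example with BEDTools, as a set of marked window IDs. The function name and the fixed seed are illustrative. As in the text, only permutations that strictly exceed the observed overlap are counted, and the count is divided by the number of permutations.

```python
import random


def permutation_pvalue(significant, all_windows, marked, n_perm=1000, seed=0):
    """Empirical P-value for the overlap of significant windows with marked windows.

    significant: set of window IDs called significant.
    all_windows: list of all window IDs, regardless of significance.
    marked: set of window IDs overlapping H3K36me3 domains (precomputed).
    """
    rng = random.Random(seed)
    observed = len(significant & marked)
    k = len(significant)
    exceed = 0
    for _ in range(n_perm):
        # draw the same number of windows at random from the full list
        perm = rng.sample(all_windows, k)
        if sum(1 for w in perm if w in marked) > observed:
            exceed += 1
    return exceed / n_perm
```

With 1000 permutations, the smallest nonzero P-value this procedure can report is 0.001. A result of 0 means that no permutation exceeded the observed overlap.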



Tissue microarray construction and immunohistochemistry 

Tissue microarrays (TMAs) were constructed from formalin-fixed, paraffin-embedded tumor blocks from 69 ccRCC tumors and 11 matched normal kidneys collected at the time of nephrectomy. Hematoxylin and eosin-stained slides were reviewed to identify a target area of ccRCC histology in each tissue block. TMAs were then constructed using 0.6-mm cores on a manual tissue microarrayer (Beecher Instruments). Tumor and normal samples were represented in triplicate. Sequential 4-µm sections were cut from each TMA.

Immunohistochemical (IHC) staining of H3K36me3, histone H3, and SETD2 was performed on a Bond Autostainer (Leica Microsystems, Inc.) according to the manufacturer's protocol. Antigen retrieval for H3K36me3, SETD2, and histone H3 was performed for 30 min in citrate buffer (pH 6.0) (Bond #AR9961), and slides were hydrated with Bond wash buffer (AR9590). Slides were incubated with antibodies against H3K36me3 (Abcam, ab9050, dilution 1:2000), histone H3 (courtesy of the Strahl laboratory, dilution 1:5000), or SETD2 (Abcam, ab31358, dilution 1:200) for 1 h at room temperature. Antibody detection was performed with the Bond Polymer Refine Detection System (DS9800), followed by image acquisition (ScanScope CS, Aperio Technologies).

Quantification of H3K36me3, SETD2, and histone H3 staining was performed independently by two reviewers who were blinded to tissue identity. The percentage of tumor cells with positive nuclei was determined by evaluating the entire core for each sample. H3K36me3 or SETD2 staining was averaged across triplicate samples and normalized to total H3 to correct for differences in cell number. Using the minimum normalized H3K36me3 value in normal kidney as a cutoff, tumors were stratified as either "H3K36me3-normal" or "H3K36me3-deficient" for subsequent analyses. Five additional tumors (not represented on the tissue microarray) were similarly assessed by immunohistochemistry and classified as "H3K36me3-deficient" (three tumors) or "H3K36me3-normal" (two tumors).
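The normalization and cutoff rule can be sketched in Python as follows. Several details are assumptions rather than statements from the text. The triplicate mean of the mark is divided by the triplicate mean of total H3, rather than normalizing each core before averaging. A tumor scoring below the lowest normal-kidney value is called deficient, and a tie at the cutoff is called normal. The function names are hypothetical.

```python
from statistics import mean


def normalized_score(mark_triplicate, h3_triplicate):
    """Mean % positive nuclei for the mark, divided by mean % for total H3."""
    return mean(mark_triplicate) / mean(h3_triplicate)


def stratify(tumor_scores, normal_scores):
    """Classify tumors against the minimum normalized value in normal kidney.

    tumor_scores: dict mapping tumor ID -> normalized H3K36me3 score.
    normal_scores: iterable of normalized scores from normal kidneys.
    """
    cutoff = min(normal_scores)
    return {
        tumor: "H3K36me3-deficient" if score < cutoff else "H3K36me3-normal"
        for tumor, score in tumor_scores.items()
    }
```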

Intron retention estimates by RNA-seq 

Total RNA was isolated from tumors (miRNeasy, Qiagen) and validated to have a median RNA Integrity Number (RIN) of 8.6 (minimum 6.8) using a Bioanalyzer (Agilent). Ribosomal RNA was depleted (RiboMinus, Invitrogen), and RNA was fragmented (RNA Fragmentation Reagents, Ambion). cDNA was generated by random priming (SuperScript II, Invitrogen), followed by second-strand synthesis (DNA polymerase I, Enzymatics), and purified (PCR purification kit, Qiagen). Libraries were prepared according to the manufacturer's specifications (Illumina). Sequencing was performed using 50-bp single-end reads (Illumina HiSeq 2000). Reads were aligned to the reference human genome (hg19) using TopHat (Trapnell et al. 2009), and gene expression was estimated by calculating RPKM from exonic reads only. Intron retention scores were calculated for each gene as follows:

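$$\mathrm{IR} = \frac{\dfrac{\sum \text{intronic coverage}}{\sum \text{intronic length}}}{\dfrac{\sum \text{exonic coverage}}{\sum \text{exonic length}} + \dfrac{\sum \text{intronic coverage}}{\sum \text{intronic length}}}$$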
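The per-gene score can be computed as intronic read density divided by the sum of exonic and intronic read densities. A Python sketch follows, together with the standard RPKM formula. The function names are hypothetical. Coverage here means summed read coverage per feature. Returning `None` for genes without introns or without coverage is a choice made for this sketch.

```python
def intron_retention_score(exon_cov, exon_len, intron_cov, intron_len):
    """Intron retention score for one gene.

    exon_cov / intron_cov: per-feature read coverage for exons / introns.
    exon_len / intron_len: per-feature lengths in bp.
    Returns None when the gene has no exons, no introns, or no coverage.
    """
    if sum(exon_len) == 0 or sum(intron_len) == 0:
        return None
    intron_density = sum(intron_cov) / sum(intron_len)
    exon_density = sum(exon_cov) / sum(exon_len)
    total = exon_density + intron_density
    if total == 0:
        return None
    return intron_density / total


def rpkm(exonic_reads, exonic_length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads."""
    return exonic_reads * 1e9 / (exonic_length_bp * total_mapped_reads)
```

The score lies between 0 and 1. It approaches 0 when introns are fully spliced out and rises as intronic density approaches exonic density.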
Table 1. Quantitative RT-PCR primer sequences

Gene     Direction   Primer sequence (5' to 3')
ABCF1    Forward     CGCCAAGCCATGTTAGAAAATG
ABCF1    Reverse     CGGCTACAATGTACAGGTCTG
USH1C    Forward1    ACCATCTCCAAACCTGTCATG
USH1C    Forward2    ATGATCAGGGAGTGGAACC
USH1C    Reverse     CCATCCTCTTCAACATCTCCTG


Quantitative RT-PCR 

Total RNA extracted from patient tumors (miRNeasy, Qiagen) was either rRNA-depleted (RiboMinus, Invitrogen) or poly(A)-selected (Oligotex mRNA Mini Kit, Qiagen). RNA was reverse-transcribed by random priming (SuperScript II Reverse Transcriptase, Invitrogen), and cDNA was quantified by real-time PCR (Maxima SYBR Green/ROX qPCR Master Mix, Thermo Fisher Scientific; 7900HT Fast Real-Time PCR System, Applied Biosystems) and normalized to ABCF1 (see Table 1).
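The text states that target cDNA levels were normalized to ABCF1 but does not specify the quantification model. The Python sketch below assumes the common 2^-ΔCt relative-quantification approach, which assumes roughly 100% amplification efficiency for both amplicons. The function name is hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference):
    """Relative expression by the 2^-dCt method (assumed, not stated in the text).

    ct_target: replicate Ct values for the target amplicon (e.g. a USH1C product).
    ct_reference: replicate Ct values for the ABCF1 reference amplicon.
    """
    delta_ct = mean(ct_target) - mean(ct_reference)
    return 2.0 ** (-delta_ct)
```

A target that crosses threshold three cycles after ABCF1 yields 2^-3 = 0.125, which is one-eighth of the reference level.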

Differential splicing analysis 

RNA-seq reads were aligned to the human reference genome using MapSplice (Wang et al. 2010). For each gene, a splicing graph was created as previously described (Singh et al. 2011), in which each exon and splicing event was represented as an edge and each splice junction as a node. We computed a "splicing fraction" for each edge as the RNA-seq coverage of that edge divided by the total coverage of all edges sharing one of its nodes. Only edges with coverage exceeding five reads and genes with multiple isoforms (13,879 genes) were considered. The node exhibiting the largest difference between SETD2-mutant and normal tumors was determined by comparing the median of each group. As a control, we created random groups of tumors of the same sizes. Splicing differences between SETD2-mutant and normal tumors were compared with those of the control groups by a Kruskal-Wallis one-way analysis of variance test. The skipped exon ratio was computed as the ratio of the coverage of the included exon to the sum of the coverages of the included exon and the exon-skipping splice junction.
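The splicing-fraction, median-comparison, and skipped-exon calculations can be sketched in Python as follows. In this sketch the graph is directed, and "edges sharing one node" is interpreted as the out-edges of an edge's source node and the in-edges of its target node, so each edge receives two fractions. This interpretation, the data layout, and all function names are assumptions for illustration. The Kruskal-Wallis comparison against random groups is not reproduced.

```python
from collections import defaultdict
from statistics import median


def splicing_fractions(edges, min_cov=5):
    """Splicing fractions for each edge of one gene's splicing graph.

    edges: dict mapping (source_node, target_node) -> read coverage, where
    edges are exons or splicing events and nodes are splice junctions.
    Returns dict edge -> (fraction at source node, fraction at target node).
    """
    kept = {e: c for e, c in edges.items() if c > min_cov}  # > 5 reads
    out_total, in_total = defaultdict(float), defaultdict(float)
    for (u, v), c in kept.items():
        out_total[u] += c
        in_total[v] += c
    return {e: (c / out_total[e[0]], c / in_total[e[1]]) for e, c in kept.items()}


def max_median_difference(fractions_mut, fractions_wt):
    """(|median difference|, feature) with the largest gap between groups.

    fractions_*: dict mapping feature -> list of per-tumor splicing fractions.
    """
    shared = fractions_mut.keys() & fractions_wt.keys()
    return max((abs(median(fractions_mut[f]) - median(fractions_wt[f])), f) for f in shared)


def skipped_exon_ratio(inclusion_cov, skipping_cov):
    """Included-exon coverage over included-exon plus skipping-junction coverage."""
    return inclusion_cov / (inclusion_cov + skipping_cov)
```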

Data acquisition 

TCGA data were accessed with authorization. Nephrectomy spec- 
imens were collected under institutional IRB-approved protocols. 

Data access 

Sequencing data and mutational analysis files have been submitted 
to EMBL-EBI ArrayExpress (http://www.ebi.ac.uk/arrayexpress/) un- 
der accession number E-MTAB-1936. 